KERNEL DENSITY METHODS

This is a Jupyter notebook. Lectures about Python, useful both for beginners and experts, can be found at http://scipy-lectures.github.io.

Open the notebook by (1) copying this file into a directory, (2) typing jupyter-notebook in that directory, and (3) selecting the notebook.


Written by: Riddhish Bhalodia


In this exercise, we will learn about different kernel methods for pattern recognition. The reader should be familiar with basic machine learning, estimation theory, and linear algebra, and should have a good command of regression.

Dependencies: numpy, matplotlib

Dual Representation

Dual representation is an alternative way of formulating many of the linear models used for regression and classification problems. Here the predictions/estimates are based on a linear combination of a kernel function evaluated at the training points. To see what kernel functions are and how they arise, we consider the regularized least-squares error for linear regression:

$$J(\textbf{w}) = \frac{1}{2} \sum \limits_{n=1}^N \{\textbf{w}^T \phi(\textbf{x}_n) - \textbf{t}_n\}^2 + \frac{\lambda}{2}\textbf{w}^T\textbf{w} \quad \quad \quad (1)$$

By setting the gradient of the error term (1) with respect to the vector $\textbf{w}$ to zero, we find that the solution is a linear combination of the vectors $\phi (\textbf{x}_n)$, with weights that are themselves functions of $\textbf{w}$, as follows.

$$\textbf{w} = -\frac{1}{\lambda} \sum \limits_{n=1}^N \{\textbf{w}^T \phi(\textbf{x}_n) - \textbf{t}_n\}\phi (\textbf{x}_n) = \pmb{\phi}^T\textbf{a} \quad \quad \quad (2)$$

Here $\pmb{\phi}$ is called the design matrix, whose $n$th row is given by $\phi(\textbf{x}_n)^T$. The vector $\textbf{a} = (a_1, a_2, \cdots, a_N)^T$ is given by

$$a_n = -\frac{1}{\lambda} \{\textbf{w}^T \phi(\textbf{x}_n) - \textbf{t}_n\} \quad \quad \quad (3)$$
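As a quick illustration of the design matrix introduced above (a minimal sketch; the toy dataset and the polynomial basis functions are arbitrary choices, not part of the derivation), the cell below builds $\pmb{\phi}$ for a small 1-D dataset, with the $n$th row equal to $\phi(\textbf{x}_n)^T$.


In [ ]:

import numpy as np

# Toy 1-D dataset (arbitrary, for illustration only)
x = np.linspace(0.0, 1.0, 5)          # training inputs x_1, ..., x_N
t = np.sin(2 * np.pi * x)             # training targets t_1, ..., t_N

def phi(x_n, M=4):
    """Assumed polynomial basis: phi(x_n) = (1, x_n, x_n^2, ..., x_n^(M-1))."""
    return np.array([x_n ** j for j in range(M)])

# Design matrix: the n-th row is phi(x_n)^T, so Phi has shape (N, M)
Phi = np.vstack([phi(x_n) for x_n in x])
print(Phi.shape)                      # (5, 4)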

Now, we would like to express $J(\textbf{w})$ in terms of $\textbf{a}$, and hence we substitute equation (2) into equation (1). So we get

$$J(\textbf{a}) = \frac{1}{2}\pmb{a^T \phi \phi^T \phi \phi^T a} - \pmb{a^T \phi \phi^T t} + \frac{1}{2}\pmb{t^T t} + \frac{\lambda}{2}\pmb{a^T \phi \phi^T a} \quad \quad \quad (4)$$

Now we define something called a Gram Matrix as

$$\textbf{K} = \pmb{\phi \phi^T} \quad \quad \quad (5)$$

The Gram matrix is a symmetric $N \times N$ matrix whose elements are given by

$$K_{nm} = \phi(\textbf{x}_n)^T\phi(\textbf{x}_m) = k(\textbf{x}_n, \textbf{x}_m) \quad \quad \quad (6)$$

Here, $k(\textbf{x}_n, \textbf{x}_m)$ is called the kernel function, which is defined as the inner product in feature space, as follows.

$$k(\textbf{x}, \textbf{x}') = \phi(\textbf{x})^T\phi(\textbf{x}') \quad \quad \quad (7)$$
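To make equation (7) concrete (a small sketch; the quadratic kernel and the 2-D inputs below are assumed examples, not taken from the derivation above), we can check numerically that evaluating a kernel directly agrees with taking the inner product of explicit feature vectors.


In [ ]:

import numpy as np

def k(x, xp):
    """Example quadratic kernel: k(x, x') = (x^T x')^2 for 2-D inputs."""
    return (x @ xp) ** 2

def phi(x):
    """Explicit feature map whose inner product reproduces the quadratic kernel:
    phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(k(x, xp))            # kernel evaluated directly
print(phi(x) @ phi(xp))    # inner product in feature space -- same value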

Hence, we can see that equation (4) can be written completely in terms of the Gram matrix $\textbf{K}$:

$$J(\textbf{a}) = \frac{1}{2}\pmb{a}^T\textbf{K}\textbf{K}\pmb{a} - \pmb{a}^T\textbf{K}\textbf{t} + \frac{1}{2}\pmb{t}^T\pmb{t} + \frac{\lambda}{2}\pmb{a}^T\textbf{K}\pmb{a} \quad \quad \quad (8)$$


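As a final check (a minimal sketch; the dataset, the polynomial basis, and the values of $\textbf{a}$ and $\lambda$ are all arbitrary choices for illustration), the cell below builds the Gram matrix $\textbf{K} = \pmb{\phi \phi^T}$ from equation (5), verifies that its entries match equation (6), and confirms numerically that $J$ evaluated via equation (1) with $\textbf{w} = \pmb{\phi}^T\textbf{a}$ agrees with the same quantity written purely in terms of $\textbf{K}$, as in equation (8).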
In [ ]:
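
import numpy as np

# Toy setup (same assumed polynomial basis as earlier; all numbers arbitrary)
x = np.linspace(0.0, 1.0, 5)
t = np.sin(2 * np.pi * x)
lam = 0.1

def phi(x_n, M=4):
    return np.array([x_n ** j for j in range(M)])

Phi = np.vstack([phi(x_n) for x_n in x])   # design matrix, shape (N, M)
K = Phi @ Phi.T                            # Gram matrix, equation (5)

# Each entry of K is the kernel evaluated at a pair of training points, equation (6)
print(np.allclose(K[1, 3], phi(x[1]) @ phi(x[3])))    # True
print(np.allclose(K, K.T))                            # True: K is symmetric

# Pick an arbitrary dual vector a, form w = Phi^T a (equation 2), and compare
# J written in terms of w (equation 1) with J written in terms of K (equation 8)
a = np.random.default_rng(0).normal(size=len(x))
w = Phi.T @ a

J_w = 0.5 * np.sum((Phi @ w - t) ** 2) + 0.5 * lam * w @ w
J_a = 0.5 * a @ K @ K @ a - a @ K @ t + 0.5 * t @ t + 0.5 * lam * a @ K @ a
print(np.allclose(J_w, J_a))                          # True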